Take Home Exercise 1

Author

Hulwana

1 Overview

1.1 Getting Started

In the code chunk below, p_load() of the pacman package is used to install and load the following R packages into the R environment:

  • sf is used for importing and handling geospatial data in R,

  • tidyverse is mainly used for wrangling attribute data in R,

  • tmap will be used to prepare cartographic-quality choropleth maps,

  • spdep will be used to compute spatial weights, global and local spatial autocorrelation statistics, and

  • funModeling will be used for rapid Exploratory Data Analysis.

pacman::p_load(sf, tidyverse, tmap, spdep, readr, dplyr, tidyr,funModeling)

1.2 Importing Geospatial Data

In this exercise, two geospatial datasets will be used:

  • geo_export

  • nga_ADM2

1.2.1 Importing water point geospatial data

First, we are going to import the water point geospatial data (i.e. geo_export) by using the code chunk below.

wp <- st_read(dsn = "data",
              layer = "geo_export",
              crs = 4326) %>%
  filter(clean_coun == "Nigeria")

Things to learn from the code chunk above:

  • st_read() of sf package is used to import geo_export shapefile into R environment and save the imported geospatial data into simple feature data table.

  • filter() of dplyr package is used to extract water point records of Nigeria.
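The read-then-filter pattern above can be tried without the real data. The sketch below is only an illustration: the toy points, their coordinates and the layer name toy_points are invented, and the shapefile is written to a temporary folder first so that st_read() has something to import.

```r
library(sf)
library(dplyr)

# Toy water points; the coordinates and country labels are invented.
toy <- st_as_sf(
  data.frame(
    clean_coun = c("Nigeria", "Ghana", "Nigeria"),
    lon = c(7.40, -0.20, 3.40),
    lat = c(5.10, 5.60, 6.50)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Write the toy layer as a shapefile into a fresh temporary folder.
shp_dir <- tempfile("toy_shp_")
dir.create(shp_dir)
st_write(toy, file.path(shp_dir, "toy_points.shp"), quiet = TRUE)

# Read it back by folder (dsn) and layer name, then keep Nigerian records only.
wp_toy <- st_read(dsn = shp_dir, layer = "toy_points", quiet = TRUE) %>%
  filter(clean_coun == "Nigeria")

nrow(wp_toy)
```

Because filter() has a method for sf objects, the geometry column is carried along with the rows that are kept.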

Next, write_rds() of the readr package is used to save the extracted sf data table (i.e. wp) into an output file in rds data format. The output file is called wp_nga.rds and it is saved in the data sub-folder.

write_rds(wp, "data/wp_nga.rds")

1.2.2 Import Nigeria LGA Boundary data

Now, we are going to import the LGA boundary data into R environment by using the code chunk below.

nga <- st_read(dsn = "data",
               layer = "nga_admbnda_adm2_osgof_20190417",
               crs = 4326)

Things to learn from the code chunk above:

  • st_read() of sf package is used to import nga_admbnda_adm2_osgof_20190417 shapefile into R environment and save the imported geospatial data into simple feature data table.

1.3 Data Wrangling

1.3.1 Recoding NA values into string

In the code chunk below, replace_na() is used to recode all the NA values in status_cle field into Unknown.

wp_nga <- read_rds("data/wp_nga.rds") %>%
  dplyr::mutate(status_cle = 
           replace_na(status_cle, "Unknown"))
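As a quick illustration of what replace_na() does to a single column, here is a minimal sketch on an invented character vector (the values are made up and are not taken from wp_nga):

```r
library(tidyr)

# Invented status values with two missing entries.
status <- c("Functional", NA, "Non-Functional", NA)

# Every NA is swapped for the replacement string; other values are untouched.
status_filled <- replace_na(status, "Unknown")
status_filled
```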

1.3.2 EDA

In the code chunk below, freq() of funModeling package is used to display the distribution of status_cle field in wp_nga.

freq(data=wp_nga, 
     input = 'status_cle')

The bar chart above shows that slightly less than 50% of water points in Nigeria are functional. It is therefore crucial to dive deeper to determine whether there are significant patterns in areas that lack functional water points, and whether neighbouring areas can support those facing scarcity in water supply.

Observe that two categories have similar names (i.e. ‘Non-Functional due to dry season’ and ‘Non functional due to dry season’). We will standardise this by changing the latter to ‘Non-Functional due to dry season’. We will also group the water points marked ‘Abandoned’ with those under ‘Abandoned/Decommissioned’.

wp_nga$status_cle[wp_nga$status_cle == "Non functional due to dry season"] <- "Non-Functional due to dry season"
wp_nga$status_cle[wp_nga$status_cle == "Abandoned"] <- "Abandoned/Decommissioned"
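The same relabelling can also be written as a single lookup. The sketch below uses recode() of dplyr on an invented vector; it is an alternative phrasing rather than the approach used in this exercise, and recode() is marked as superseded in recent dplyr releases, although it still works.

```r
library(dplyr)

# Invented status values covering both labels that need fixing.
status <- c("Abandoned", "Non functional due to dry season", "Functional")

# Old label on the left, new label on the right; unmatched values are kept.
status_clean <- recode(
  status,
  "Non functional due to dry season" = "Non-Functional due to dry season",
  "Abandoned" = "Abandoned/Decommissioned"
)
status_clean
```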

We rerun freq() to get the following chart.

freq(data=wp_nga, 
     input = 'status_cle')

Distribution of water-points by status

1.4 Extracting Water Point Data

In this section, we will extract the water point records by using classes in status_cle field.

1.4.1 Extracting functional water point

In the code chunk below, filter() of dplyr is used to select functional water points.

wpt_functional <- wp_nga %>%
  filter(status_cle %in%
           c("Functional", 
             "Functional but not in use",
             "Functional but needs repair"))
freq(data = wpt_functional,
     input = "status_cle")

1.4.2 Extracting non-functional water point

In the code chunk below, filter() of dplyr is used to select non-functional water points.

wpt_nonfunctional <- wp_nga %>%
  filter(status_cle %in%
           c("Abandoned/Decommissioned", 
             "Non-Functional",
             "Non-Functional due to dry season"))
freq(data=wpt_nonfunctional, 
     input = 'status_cle')

1.4.3 Extracting water point with Unknown class

In the code chunk below, filter() of dplyr is used to select water points with unknown status.

wpt_unknown <- wp_nga %>%
  filter(status_cle == "Unknown")

1.5 Performing Point-in-Polygon Count

nga_wp <- nga %>% 
  mutate(`total wpt` = lengths(
    st_intersects(nga, wp_nga))) %>%
  mutate(`wpt functional` = lengths(
    st_intersects(nga, wpt_functional))) %>%
  mutate(`wpt non-functional` = lengths(
    st_intersects(nga, wpt_nonfunctional))) %>%
  mutate(`wpt unknown` = lengths(
    st_intersects(nga, wpt_unknown)))
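The count above relies on st_intersects() returning, for each polygon, the indices of the points that fall inside it; lengths() then turns that list into one count per polygon. A minimal sketch on invented toy geometries (no CRS is set, so plain planar coordinates are assumed):

```r
library(sf)

# Helper for this sketch only: a unit square with its lower-left corner at (x0, y0).
unit_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two toy polygons standing in for LGAs.
polys <- st_sf(
  name = c("A", "B"),
  geometry = st_sfc(unit_square(0, 0), unit_square(2, 0))
)

# Four toy points: two inside A, one inside B, one outside both.
pts <- st_sf(
  id = 1:4,
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), st_point(c(0.2, 0.8)),
    st_point(c(2.5, 0.5)), st_point(c(5, 5))
  )
)

# One list element per polygon; its length is the number of points inside.
polys$n_pts <- lengths(st_intersects(polys, pts))
polys$n_pts
```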

1.6 Saving the Analytical Data Table

nga_wp <- nga_wp %>%
  mutate(pct_functional = `wpt functional`/`total wpt`) %>%
  mutate(`pct_non-functional` = `wpt non-functional`/`total wpt`) %>%
  select(3:4, 8:10, 15:23)

Things to learn from the code chunk above:

  • mutate() of dplyr package is used to derive two fields namely pct_functional and pct_non-functional.
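One thing to watch for: an LGA with no water points gives 0/0, which R evaluates to NaN (as happens for Abadam in the table printed in section 1.8). The sketch below, on a three-row toy table copied from that output, shows one possible way to return NA instead. The pct_functional_safe column is a hypothetical name and is not part of this exercise.

```r
library(dplyr)

# Three rows copied from the printed output; Abadam has no water points.
toy <- tibble(
  ADM2_EN = c("Aba North", "Aba South", "Abadam"),
  `total wpt` = c(17, 71, 0),
  `wpt functional` = c(7, 29, 0)
)

toy <- toy %>%
  mutate(
    # Plain division: 0 / 0 yields NaN.
    pct_functional = `wpt functional` / `total wpt`,
    # Guarded division: NA when there is nothing to divide by.
    pct_functional_safe = if_else(
      `total wpt` > 0,
      `wpt functional` / `total wpt`,
      NA_real_
    )
  )
toy
```

Whether NaN or NA is preferable depends on how the later mapping and autocorrelation steps treat missing values.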

Now, you have the tidy sf data table for subsequent analysis. We will save the sf data table into rds format.

write_rds(nga_wp, "data/nga_wp.rds")

1.7 Visualising the Spatial Distribution of Water Points

nga_wp <- read_rds("data/nga_wp.rds")
total <- qtm(nga_wp, "total wpt")
wp_functional <- qtm(nga_wp, "wpt functional")
wp_nonfunctional <- qtm(nga_wp, "wpt non-functional")
unknown <- qtm(nga_wp, "wpt unknown")

tmap_mode("view")
tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=1, ncol=2)

Based on the maps above, the north-west zone has the most functional water points, while non-functional water points appear to be scattered all over Nigeria.

It is interesting to note that while the district Ifelodun has a relatively high number of functional water points, it also has the highest number of non-functional water points.

Water points with unknown status are mostly concentrated in the north-central zone of Nigeria.

1.8 Summary Statistics of the data

First, we will take a look at the dataset.

head(nga_wp, n= 10)
Simple feature collection with 10 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 3.005022 ymin: 4.888055 xmax: 13.83477 ymax: 13.71406
Geodetic CRS:  WGS 84
          ADM2_EN ADM2_PCODE                   ADM1_EN ADM1_PCODE ADM0_EN
1       Aba North   NG001001                      Abia      NG001 Nigeria
2       Aba South   NG001002                      Abia      NG001 Nigeria
3          Abadam   NG008001                     Borno      NG008 Nigeria
4           Abaji   NG015001 Federal Capital Territory      NG015 Nigeria
5            Abak   NG003001                 Akwa Ibom      NG003 Nigeria
6       Abakaliki   NG011001                    Ebonyi      NG011 Nigeria
7  Abeokuta North   NG028001                      Ogun      NG028 Nigeria
8  Abeokuta South   NG028002                      Ogun      NG028 Nigeria
9             Abi   NG009001               Cross River      NG009 Nigeria
10    Aboh-Mbaise   NG017001                       Imo      NG017 Nigeria
                       SD_EN SD_PCODE total wpt wpt functional
1                 Abia South  NG00103        17              7
2                 Abia South  NG00103        71             29
3                Borno North  NG00802         0              0
4  Federal Capital Territory  NG01501        57             23
5       Akwa Ibom North West  NG00302        48             23
6               Ebonyi North  NG01103       233             82
7               Ogun Central  NG02801        34             16
8               Ogun Central  NG02801       119             72
9        Cross River Central  NG00901       152             79
10                  Imo East  NG01701        66             18
   wpt non-functional wpt unknown pct_functional pct_non-functional
1                   9           1      0.4117647          0.5294118
2                  35           7      0.4084507          0.4929577
3                   0           0            NaN                NaN
4                  34           0      0.4035088          0.5964912
5                  25           0      0.4791667          0.5208333
6                  42         109      0.3519313          0.1802575
7                  15           3      0.4705882          0.4411765
8                  33          14      0.6050420          0.2773109
9                  62          11      0.5197368          0.4078947
10                 26          22      0.2727273          0.3939394
                         geometry
1  MULTIPOLYGON (((7.401109 5....
2  MULTIPOLYGON (((7.387495 5....
3  MULTIPOLYGON (((13.83477 13...
4  MULTIPOLYGON (((7.045872 9....
5  MULTIPOLYGON (((7.811244 5....
6  MULTIPOLYGON (((8.4109 6.28...
7  MULTIPOLYGON (((3.143903 7....
8  MULTIPOLYGON (((3.399307 7....
9  MULTIPOLYGON (((8.153282 5....
10 MULTIPOLYGON (((7.321909 5....

1.8.1 Top 10 areas with the most functional waterpoints by state or federal capital territory

nga_state <- nga_wp %>%
  group_by(ADM1_EN) %>%
  summarise(total_functional = sum(`wpt functional`))
  # dplyr::top_n(10, `wpt functional`) %>%
  # dplyr::select(ADM1_EN, `wpt functional`)
nga_state
Simple feature collection with 37 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84
# A tibble: 37 × 3
   ADM1_EN     total_functional                                         geometry
   <chr>                  <int>                                   <GEOMETRY [°]>
 1 Abia                     297 POLYGON ((7.497577 5.90766, 7.518662 5.908043, …
 2 Adamawa                  194 POLYGON ((12.82164 8.947348, 12.82638 8.949684,…
 3 Akwa Ibom                571 MULTIPOLYGON (((7.530807 5.150259, 7.531415 5.1…
 4 Anambra                  246 POLYGON ((6.762889 6.185386, 6.767301 6.181836,…
 5 Bauchi                  2534 POLYGON ((9.965748 9.773833, 9.983683 9.769546,…
 6 Bayelsa                  102 POLYGON ((6.541175 5.295973, 6.548624 5.302347,…
 7 Benue                   1356 POLYGON ((9.055993 8.077275, 9.050176 8.075202,…
 8 Borno                    423 POLYGON ((14.58718 11.75277, 14.58861 11.75334,…
 9 Cross River             1152 MULTIPOLYGON (((8.818036 5.693561, 8.815814 5.7…
10 Delta                    265 POLYGON ((5.985599 5.124185, 5.99217 5.117613, …
# … with 27 more rows
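The summary above lists every state. To keep only the highest-ranked ones, slice_max() of dplyr could be chained after summarise(). The sketch below is an illustration on a plain tibble built from the ten LGA rows printed in section 1.8, so the sums are partial and are not the real state totals; n = 3 is used only because the toy table is small.

```r
library(dplyr)

# Functional counts for the ten LGAs shown by head(nga_wp, n = 10).
lga <- tibble(
  ADM1_EN = c("Abia", "Abia", "Borno", "Federal Capital Territory",
              "Akwa Ibom", "Ebonyi", "Ogun", "Ogun", "Cross River", "Imo"),
  `wpt functional` = c(7, 29, 0, 23, 23, 82, 16, 72, 79, 18)
)

# Sum per state, then keep the three largest totals in descending order.
top_states <- lga %>%
  group_by(ADM1_EN) %>%
  summarise(total_functional = sum(`wpt functional`)) %>%
  slice_max(total_functional, n = 3)
top_states
```

On the real sf object, summarise() also unions the geometries of each group, so dropping the geometry first with st_drop_geometry() may be faster when only the ranking table is needed.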

Plot by state

tmap_mode("view")
tm_shape(nga_state)+
# tm_polygons(col = "orange",
#            size = 2,
#            border.col = "black",
#            border.lwd = 1) +
tm_fill("total_functional") +
tm_borders()
plot(nga_wp)

1.9 Transforming the projection of nga_wp from WGS 84 to Minna / Nigeria West Belt

In geospatial analytics, it is very common to transform the original data from a geographic coordinate system to a projected coordinate system. This is because a geographic coordinate system is not appropriate if the analysis needs to use distance and/or area measurements.

The print below reveals that the assigned coordinate system is WGS 84, the 'World Geodetic System 1984', which is inappropriate in our case. We should instead use one of the projected CRSs of Nigeria, with an EPSG code of 26391, 26392 or 26393. A country's EPSG code can be obtained by referring to epsg.io.

We will use the EPSG code 26391 in our analysis.

st_geometry(nga_wp)
Geometry set for 774 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84
First 5 geometries:

Therefore, we need to reproject nga_wp from one coordinate system to another mathematically using the st_transform() function of the sf package, as shown in the code chunk below.

nga_wp26391 <- st_transform(nga_wp, crs = 26391)

Next, we will view the content of nga_wp26391 sf data frame as shown below.

st_geometry(nga_wp26391)
Geometry set for 774 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 28879.72 ymin: 30292.37 xmax: 1343798 ymax: 1094244
Projected CRS: Minna / Nigeria West Belt
First 5 geometries:

Notice that the Geodetic CRS has been replaced by the Projected CRS Minna / Nigeria West Belt.
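The effect of the reprojection can be checked on a single point. In the sketch below the coordinates are invented; st_is_longlat() reports whether a geometry is still in degrees, and st_crs()$epsg returns the EPSG code attached to it.

```r
library(sf)

# One invented point in WGS 84 (longitude, latitude in degrees).
pt_wgs84 <- st_sfc(st_point(c(7.40, 5.10)), crs = 4326)

# Reproject to Minna / Nigeria West Belt (EPSG:26391), whose units are metres.
pt_minna <- st_transform(pt_wgs84, crs = 26391)

st_is_longlat(pt_wgs84)  # geographic coordinates before the transform
st_is_longlat(pt_minna)  # projected coordinates after the transform
st_crs(pt_minna)$epsg
```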

Limitations/ Further work

Future work could demarcate the different regions of Nigeria, as outlined below, to better understand whether certain regions face water shortages more severely than others.